Amid the Coronavirus Disease pandemic in 2020, governments around the world developed a response to aid the citizens of their countries and mitigate the spread of the Severe Acute Respiratory Syndrome Coronavirus 2. This study aims to understand the relationship between the most recent cumulative number of confirmed cases of COVID-19 in different countries, per 10,000 individuals (on the 23rd of October 2020), with the past government responses in these countries to the outbreak (set on the 15th of June 2020). A model will be constructed to depict this relationship.
The data used in this study were obtained from The Humanitarian Data Exchange data portal and include the total population of each country in 2019[1], the Stringency and Economic Support indices on the 15th of June 2020[2], and the cumulative number of confirmed cases of COVID-19 in different countries[3] on the 23rd of October 2020. When comparing the number of infected individuals across countries, the population size of each country needs to be considered. The study therefore looks at the cumulative number of confirmed cases of COVID-19 on the 23rd of October 2020 in each country, per 10,000 individuals, calculated as: \(\frac{\text{cumulative cases in the country}}{\text{total population of the country}} \cdot 10{,}000\). The 23rd of October was chosen because it was the most recent date available when this study was carried out.
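The normalisation above can be sketched in R as follows. This is only an illustration: the data frame, its column names (`cumulative_cases`, `population`), and the figures are hypothetical and are not taken from the study's data.

```r
# Illustrative values only; country labels, column names, and figures are hypothetical.
cases <- data.frame(
  country          = c("A", "B"),
  cumulative_cases = c(25000, 120),
  population       = c(10000000, 80000)
)

# Cumulative confirmed cases per 10,000 individuals.
cases$cases_per_10000 <- cases$cumulative_cases / cases$population * 10000

print(cases)
```

Dividing by the population before scaling makes a small country and a large country directly comparable on the same per-10,000 scale.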
The continuous variables Stringency Index and Economic Support Index are used to quantify the government response. The former accounts for closure, containment, and public health measures, and the latter accounts for the economic response taken by governments. Two other indices, the Government Response Index and the Containment and Health Index, were considered in place of the Economic Support Index. However, 9 of the 13 variables used to calculate the Government Response Index, and 9 of the 11 variables used to calculate the Containment and Health Index, are the same 9 variables used to calculate the Stringency Index. This suggests a high correlation between these indices and the Stringency Index, which is not ideal for predictors in the same model. The Stringency Index and the Economic Support Index, on the other hand, are calculated with no variables in common.[4] Note that the data provide separate government responses for different regions within certain countries, for example the United States of America. Since this study looks at each country as a whole, the government response of such a country on the 15th of June 2020 is taken to be the average of the responses of all its regions on that day.
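The regional averaging described above might look like the following sketch in base R. The regional records and the `country` and `region` column names are hypothetical; only the two index names follow the study's variable names.

```r
# Hypothetical regional records for a single day; values are illustrative only.
regional <- data.frame(
  country                = c("X", "X", "X", "Y"),
  region                 = c("X-1", "X-2", "X-3", "Y-1"),
  Stringency_Index       = c(60, 70, 80, 45),
  Economic_Support_Index = c(50, 50, 75, 25)
)

# One row per country: the mean of each index over that country's regions.
country_level <- aggregate(
  cbind(Stringency_Index, Economic_Support_Index) ~ country,
  data = regional,
  FUN  = mean
)

print(country_level)
```

A country with a single record keeps its own values, so the same step can be applied to every country in the data set.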
When deciding on a day on which to measure the government response, we required a day after April 28 2020, since that is when the Stringency Index and the Economic Support Index were refined and expanded to give a more accurate measure of the government response.[5] The day also had to be at least two weeks before the 23rd of October 2020, so that the response has time to take effect before its relationship with the number of infected individuals is examined. Next, we looked at some of the components used to calculate the two indices: international travel controls[8] for the Stringency Index, and income support[6] and debt or contract relief[7] for the Economic Support Index. Countries changed their response on these components only slightly, or not at all, between May and July. Having some components of the government response stay almost constant for a period allows the response to show its effect on the number of infected individuals more clearly than a response that changes within a week of its implementation. Thus, we took a day in the middle of the May to July interval: the 15th of June 2020.
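As a quick check of the two-week requirement, the gap between the two dates can be computed directly. The variable names below are introduced only for this illustration.

```r
response_date <- as.Date("2020-06-15")  # day on which the indices are read
cases_date    <- as.Date("2020-10-23")  # day on which the cumulative cases are read

# Number of days between the government response and the case count.
lag_days <- as.numeric(difftime(cases_date, response_date, units = "days"))

lag_days          # length of the lag in days
lag_days >= 14    # TRUE when the two-week minimum is satisfied
```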
After organizing the data, 167 countries remain represented in the dataset, out of the 195 countries in the world (approximately 85%)[9].
Table 1. Sample of 5 randomly chosen countries from the data set used in this study

| Country | Stringency_Index | Economic_Support_Index | Population2019 | cumulative_confirmed_cases_per_10000 |
|---|---|---|---|---|
| Czech Republic | 41.670 | 62.5 | 10669709 | 223.3641049 |
| Dominica | 72.220 | 75.0 | 71808 | 5.1526292 |
| Romania | 50.930 | 87.5 | 19356544 | 103.8573828 |
| United Kingdom | 71.668 | 100.0 | 66834405 | 124.7875252 |
| Lao People’s Democratic Republic | 36.110 | 62.5 | 7169455 | 0.0334753 |

Table 2. Summary statistics for the cumulative confirmed cases per 10,000

| n | min | median | mean | max | sd |
|---|---|---|---|---|---|
| 167 | 0.0334753 | 33.83375 | 70.11143 | 523.4503 | 94.78323 |
Our total sample size was 167 (Table 2). The mean cumulative confirmed cases (CCC) per 10,000 is about 70.11, far greater than the median of 33.83, indicating that the CCC distribution is heavily right-skewed, as can be seen in Figure 1. This is to be expected, since the lowest possible CCC is 0 whereas there is no such bound on the highest. Most countries have a CCC below the 300 mark, but there are also some very extreme cases (outliers).
Figure 1. Distribution for the cumulative confirmed cases per 10,000 for individual countries
The distribution of the Stringency Index (Figure 2), which measures government response, seems to resemble a bell shape although there is a slight skew on the left tail. The Economic Support Index distribution (Figure 3), which records measures such as income support and debt relief, also seems to be a bit left-skewed. We notice that there are two modes at 50 and 75, but suspect that could be due to rounding.
Figure 2. Distribution for the government response measured by the Stringency Index
Figure 3. Distribution for the government response measured by the Economic Support Index
In Figure 4, the scatterplot suggests some positive correlation between the cumulative confirmed cases per 10,000 (CCC) and the Stringency Index. Without implying any causal effect, countries with a higher number of cases per 10,000 also tend to have stricter pandemic response policies. It is worth noting that a few outliers (we consider those above the 400 mark of CCC) might have more influence on the best fit line. We also notice that among the many countries with (almost) 0 CCC, the Stringency Index varies the most (from 0 to 100) compared to other CCC levels, with more points clustering in the [50, 75] range. The same spread is seen for the Economic Support Index, which suggests that countries with very low CCC spend widely varying amounts on income support and debt relief packages. However, countries with higher CCC tend to spend more on these packages.
Figure 4. Interactive Scatterplot for the cumulative confirmed cases per 10,000 for individual countries against their government response measured by the Stringency Index. The red line is the best fit line. The blue curve is the Loess curve.
The scatterplot in Figure 5 of CCC against the Economic Support Index has more points at the bottom and fewer at the top. This suggests that countries with fewer cases per 10,000 individuals tend to spend less on economic relief packages.
Figure 5. Interactive Scatterplot for the cumulative confirmed cases per 10,000 for individual countries against their government response measured by the Economic Support Index. The red line is the best fit line. The blue curve is the Loess curve.
Our initial model, with \(x_1\) denoting the Stringency Index and \(x_2\) the Economic Support Index, is the following:
\[ \begin{aligned}\widehat{Y}_{CCPTTH} =& b_{0} + b_{SI} \cdot (x_1) + b_{ESI} \cdot (x_2) \\ = & -31.3037 + 0.7567 \cdot (x_1) + 1.0102 \cdot (x_2) \end{aligned} \]
We fitted a linear model to the data and then performed a residual analysis, as an in-sample validation method, to detect any systematic departure from the assumptions on which the model is built: normality, independence, and homoscedasticity of the residuals. Figure 6 shows a normal Q-Q plot of the residuals, which plots the theoretical quantiles against their observed sample counterparts. The points follow an upward curve, implying that the residuals are heavily right-skewed. This is confirmed by the histogram of the residuals in Figure 7.
Figure 6. Normal Q-Q plot for the model under discussion
Figure 7. Residuals distribution for the statistical model
In addition, Figures 8, 9 and 10 show the residuals fanning out, implying that their variance is not constant (heteroscedasticity).
Figure 8. Residuals graph for the fitted values, with a Lowess curve in blue and a horizontal line at zero in red.
Figure 9. Residuals graph for the Stringency Index, with a Lowess curve in blue and a horizontal line at zero in red.
Figure 10. Residuals graph for the Economic Support Index, with a Lowess curve in blue and a horizontal line at zero in red.
Because of the violations of the normality and homoscedasticity assumptions described above, a transformation is needed. Using the Box-Cox method, which chooses the power that maximizes the log-likelihood (Figure 11), our dependent variable (CCC) is raised to the power 0.1818. Since this power is positive, the transformation is increasing and does not alter the direction of the correlations in our later inference. Note that in Table 2 the minimum is 0.033 (and not 0), so every response value is strictly positive and the transformation is valid without leaving out any country.
Figure 11. Graph resulting from a Box Cox Test
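The way a Box-Cox power is chosen from the log-likelihood can be sketched in base R. This is an illustration on simulated data, not the study's data or code: the helper `boxcox_loglik`, the simulated coefficients, and the grid of powers are all assumptions made for the example (the `boxcox` function in the MASS package produces a similar profile plot).

```r
set.seed(1)

# Simulated data, for illustration only (not the study's data).
n  <- 167
x1 <- runif(n, 0, 100)
x2 <- runif(n, 0, 100)
# Response constructed so that y^0.2 is roughly linear in the predictors.
y  <- (1 + 0.006 * x1 + 0.008 * x2 + rnorm(n, sd = 0.2))^5

# Profile log-likelihood of the Box-Cox power lambda, up to an additive constant.
# Hypothetical helper written for this sketch.
boxcox_loglik <- function(lambda, y, X) {
  z   <- if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
  rss <- sum(lm.fit(X, z)$residuals^2)
  -length(y) / 2 * log(rss / length(y)) + (lambda - 1) * sum(log(y))
}

X       <- cbind(1, x1, x2)              # design matrix with an intercept column
lambdas <- seq(-1, 1, by = 0.01)         # grid of candidate powers
loglik  <- sapply(lambdas, boxcox_loglik, y = y, X = X)

# The selected power is the grid value with the largest log-likelihood.
lambda_hat <- lambdas[which.max(loglik)]
lambda_hat
```

The term `(lambda - 1) * sum(log(y))` is the Jacobian of the transformation; it is what requires every response value to be strictly positive.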
Comparing the residual graphs of the transformed model (Figures 12 to 16) with those of the initial model, we observe that the distribution of the residuals is now closer to bell-shaped, the normal Q-Q plot shows an almost straight line, and the residual scatterplots are cloud-shaped (although the residuals against the Economic Support Index are more spread out). We conclude that the transformation allows the model assumptions to be reasonably met, so we proceed with our analysis.
Figure 12. Normal Q-Q plot for the transformed model
Figure 13. Residuals distribution for the transformed statistical model
Figure 14. Residuals against the fitted values of the transformed model, with a Lowess curve in blue and a horizontal line at zero in red.
Figure 15. Residuals graph for the Stringency Index after the transformation, with a Lowess curve in blue and a horizontal line at zero in red.
Figure 16. Residuals graph for the Economic Support Index after the transformation, with a Lowess curve in blue and a horizontal line at zero in red.
To ensure that multicollinearity is not a problem in the transformed model, the variance inflation factor (VIF) was calculated for each predictor. Both values are very close to 1, indicating little to no multicollinearity, so the study proceeds with the chosen model transformation.
## Stringency_Index Economic_Support_Index
## 1.00014 1.00014
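To give a sense of scale for these values: with only two predictors, the standard VIF formula reduces to a function of their sample correlation. Writing \(r\) for that correlation (a symbol introduced here for illustration), the reported value can be inverted as a rough check:

\[ \mathrm{VIF} = \frac{1}{1 - r^{2}} \quad\Longrightarrow\quad r^{2} = 1 - \frac{1}{1.00014} \approx 0.00014, \qquad |r| \approx 0.012 \]

Under this reading, the two indices are almost uncorrelated in the sample, which is consistent with their being built from no variables in common.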
Table 3. Model Summary Table
##
## Call:
## lm(formula = cumulative_confirmed_cases_per_10000_transf ~ Stringency_Index +
## Economic_Support_Index, data = tidy_joined_dataset)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.29303 -0.36540 -0.01137 0.39995 1.21433
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.017725 0.163390 6.229 3.81e-09 ***
## Stringency_Index 0.006525 0.002146 3.040 0.00275 **
## Economic_Support_Index 0.007814 0.001499 5.213 5.54e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5374 on 164 degrees of freedom
## Multiple R-squared: 0.1802, Adjusted R-squared: 0.1702
## F-statistic: 18.02 on 2 and 164 DF, p-value: 8.405e-08
Table 4. The 95% Confidence Intervals
| | 2.5 % | 97.5 % |
|---|---|---|
| (Intercept) | 0.6951053 | 1.3403447 |
| Stringency_Index | 0.0022874 | 0.0107619 |
| Economic_Support_Index | 0.0048542 | 0.0107742 |
Our model is the following:
\[ \begin{aligned}\widehat{Y}_{CCPTTH}^{0.182} =& b_{0,t} + b_{SI,t} \cdot (x_1) + b_{ESI,t} \cdot (x_2) \\ = & 1.017725 + 0.006525 \cdot (x_1) + 0.007814 \cdot (x_2) \end{aligned} \]
Using the confidence intervals in Table 4, we test several null hypotheses.
\[\begin{aligned} H_0:&\beta_{0,t} = 0 \\\ \mbox{vs }H_A:& \beta_{0,t} \neq 0 \end{aligned}\]
For the intercept of the transformed model, the 95% confidence interval is [0.6951053, 1.3403447], so zero is not a plausible value at the 95% confidence level. The p-value is also small, at 3.81e-09, so we reject the null hypothesis that the intercept is 0 in favor of the alternative that it is non-zero; since the whole interval lies above zero, the intercept is positive. In context, a positive intercept makes sense, since a country can choose not to give any economic support nor take any closure, containment, or public health measures (a Stringency Index of 0), while still having a positive cumulative number of individuals infected with COVID-19.
\[\begin{aligned} H_0:&\beta_{SI,t} = 0 \\\ \mbox{vs }H_A:& \beta_{SI,t} \neq 0 \end{aligned}\]
For the Stringency Index, the 95% confidence interval for the slope is [0.0022874, 0.0107619], so zero is not a plausible value at the 95% confidence level. The p-value is also small, at 0.00275, so we reject the null hypothesis that the slope is 0 in favor of the alternative that it is non-zero; since the whole interval lies above zero, the slope is positive.
\[\begin{aligned} H_0:&\beta_{ESI,t} = 0 \\\ \mbox{vs }H_A:& \beta_{ESI,t} \neq 0 \end{aligned}\]
For the Economic Support Index, the 95% confidence interval for the slope is [0.0048542, 0.0107742], so zero is not a plausible value at the 95% confidence level. The p-value is also very small, at 5.54e-07, so we reject the null hypothesis that the slope is 0 in favor of the alternative that it is non-zero; since the whole interval lies above zero, the slope is positive.
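The intervals and p-values used in these tests can be reproduced from the estimates and standard errors in Table 3. The sketch below assumes the usual t-based formulas with 164 residual degrees of freedom; the variable names are introduced for illustration, and small discrepancies with Table 4 come from the rounding of the inputs.

```r
# Estimates and standard errors copied from Table 3.
est <- c(Intercept = 1.017725, Stringency_Index = 0.006525,
         Economic_Support_Index = 0.007814)
se  <- c(Intercept = 0.163390, Stringency_Index = 0.002146,
         Economic_Support_Index = 0.001499)
df_resid <- 164

# Two-sided 95% critical value of the t distribution.
t_crit <- qt(0.975, df_resid)

# Confidence limits: estimate plus or minus critical value times standard error.
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)

# Two-sided p-values for H0: coefficient = 0.
t_stat  <- est / se
p_value <- 2 * pt(abs(t_stat), df_resid, lower.tail = FALSE)

round(ci, 5)
signif(p_value, 3)
```

Rejecting the null hypothesis at the 5% level is equivalent to the 95% interval excluding zero, which is why the two criteria agree for all three coefficients.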
The next research question we explore is: are the transformed cumulative cases significantly related to the Stringency Index, given nothing else in the model? From the ANOVA table (Table 5), there is sufficient evidence (F = 8.873995, p < 0.01) to conclude that the Stringency Index is significantly related to the transformed cumulative cases given nothing else in the model.

Table 5. ANOVA table for the transformed model

| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Stringency_Index | 1 | 2.563214 | 2.5632141 | 8.873995 | 0.0033322 |
| Economic_Support_Index | 1 | 7.848528 | 7.8485278 | 27.172056 | 0.0000006 |
| Residuals | 164 | 47.370672 | 0.2888456 | NA | NA |
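
The F statistics, p-values, and R-squared can be recovered from the sums of squares in Table 5. The following sketch uses only numbers copied from that table; the variable names are introduced for illustration.

```r
# Sums of squares and degrees of freedom copied from Table 5.
ss <- c(Stringency_Index = 2.563214, Economic_Support_Index = 7.848528)
ss_resid <- 47.370672
df_resid <- 164

ms_resid <- ss_resid / df_resid     # residual mean square
f_value  <- ss / ms_resid           # each term has 1 degree of freedom, so MS = SS
p_value  <- pf(f_value, 1, df_resid, lower.tail = FALSE)

# Share of the total sum of squares explained by the two predictors.
r_squared <- sum(ss) / (sum(ss) + ss_resid)

round(f_value, 3)
signif(p_value, 3)
round(r_squared, 4)   # compare with Multiple R-squared in Table 3
sqrt(ms_resid)        # compare with the residual standard error in Table 3
```

Note that both F statistics use the residual mean square of the full two-predictor model as the denominator, as is standard for a sequential ANOVA table.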
Table 6 gives 95% prediction intervals at different values of the Stringency Index. For example, for a country with a Stringency Index of 20 and an Economic Support Index of 50 (with the transformed cumulative confirmed cases per 10,000 set to 1.2), the cumulative cases per 10,000 are predicted to be between 0.01378 and 199.3965.
The intervals for Stringency Index values of 50, 70, and 90 are read in the same way. In other words, for any country with a Stringency Index of 50, 70, or 90 (and an Economic Support Index of 50, with the transformed cumulative confirmed cases per 10,000 set to 1.2), the cumulative cases per 10,000 are predicted to lie between the lower and upper limits in Table 6.
Table 6. The 95% Prediction intervals where Stringency Index = 20, 50, 70, 90, respectively, for transformed cumulative confirmed cases per 10,000 = 1.2, and economic support index = 50.
| SI | Point Estimate | Lower Limit | Upper Limit |
|---|---|---|---|
| 20 | 10.70778 | 0.01378 | 199.3965 |
| 50 | 20.68657 | 0.10941 | 288.2952 |
| 70 | 30.82751 | 0.29381 | 369.6069 |
| 90 | 44.71648 | 0.65209 | 474.4893 |
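
The point estimates in Table 6 appear to be consistent with evaluating the transformed model and then undoing the power transformation. The sketch below assumes that this is how they were obtained, and uses the rounded coefficients from Table 3 and the rounded power 0.1818, so small differences from the table are expected. The variable names are introduced for illustration.

```r
# Coefficients of the transformed model, copied from Table 3.
b0    <- 1.017725
b_si  <- 0.006525
b_esi <- 0.007814
lambda <- 0.1818   # power applied to the response, as reported in the text

si  <- c(20, 50, 70, 90)   # Stringency Index values used in Table 6
esi <- 50                  # Economic Support Index held fixed

# Prediction on the transformed scale, then back-transformed to cases per 10,000.
fit_transformed <- b0 + b_si * si + b_esi * esi
point_estimate  <- fit_transformed^(1 / lambda)

round(data.frame(SI = si, point_estimate), 3)
```

Because the back-transformation is strongly convex, fairly narrow intervals on the transformed scale become wide and asymmetric on the original scale, which matches the shape of the limits in the table.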
Table 7 gives 95% prediction intervals at different values of the Economic Support Index. For example, for a country with a Stringency Index of 75 and an Economic Support Index of 25 (with the transformed cumulative confirmed cases per 10,000 set to 1.2), the cumulative cases per 10,000 are predicted to be between 0.08134 and 272.0472.
The intervals for Economic Support Index values of 50, 75, and 100 are read in the same way. In other words, for any country with an Economic Support Index of 50, 75, or 100 (and a Stringency Index of 75, with the transformed cumulative confirmed cases per 10,000 set to 1.2), the cumulative cases per 10,000 are predicted to lie between the lower and upper limits in Table 7.
Table 7. The 95% Prediction intervals where Economic Support Index = 25, 50, 75, 100, respectively, for transformed cumulative confirmed cases per 10,000 = 1.2, and Stringency index = 75.
| ESI | Point Estimate | Lower Limit | Upper Limit |
|---|---|---|---|
| 25 | 18.65860 | 0.08134 | 272.0472 |
| 50 | 33.91222 | 0.36412 | 393.3872 |
| 75 | 58.12835 | 1.14900 | 560.8008 |
| 100 | 94.95592 | 2.90336 | 789.0344 |
Our analysis suggests a relationship between the total confirmed cases per 10,000 and a country’s Stringency and Economic Support Indices measured with a time lag of 130 days. We see evidence that CCC is positively associated with the Stringency Index and the Economic Support Index (added to the model in that order). This aligns with our expectation, since it is reasonable for a government to respond more strictly and to spend more on income support packages when its people are more affected by the pandemic.
The sample was not properly adjusted to account for the missing countries. Even though it represents approximately 85% of all the countries in the world, it fails to represent groups of countries properly, for example by continent or socio-economic region. Also, the model was not validated on another sample, so its adequacy can also be questioned.
A common limitation of data that were not organized for the study at hand is the way in which they are recorded and categorized. The data sets used in this study for the confirmed cases of COVID-19 and for the indices do not include data for every country registered with the World Bank (the population data set includes only countries registered with the World Bank). For example, the Maldives has no entry in the data set for the Stringency and Economic Support indices, although it does have an entry in the other two data sets. Using data that were not gathered for the specific purpose of this study is a limitation, since inconsistencies such as these are inevitable.
Other, non-linear models, such as higher-degree polynomial regression models, were considered. We chose the simpler model to avoid overfitting the data and to avoid overcomplicating the analysis.
The model can be greatly improved and become more helpful if other predictor variables are added to it. Variables that were not used to calculate the Stringency and Economic Support Indices could be looked into, since they will probably not have a strong correlation with the indices already in the model. Moreover, a step function can be explored due to the natural breaks in the Economic Support Index, and polynomial regression models of higher degree can be explored to try and explain more variability in the data.
Lastly, the change in the cumulative number of confirmed cases per 10,000 from the day the response was implemented to the most recent date could be examined, instead of the cumulative number of confirmed cases per 10,000 up to the most recent date.
1. “Total Population” World Bank Indicators of Interest to the COVID-19 Outbreak. COVID-19 Pandemic. World Bank. United Nations Office for the Coordination of Humanitarian Affairs. 2020. Accessed October 2020. https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases
2. “OxCGRT_CSV” OXFORD COVID-19 Government Response Stringency index, COVID-19 Pandemic. The Oxford COVID-19 Government Response Tracker. United Nations Office for the Coordination of Humanitarian Affairs. 2020. Accessed October 2020. https://data.humdata.org/dataset/oxford-covid-19-government-response-tracker
3. “time_series_covid19_confirmed_global.csv” Novel Coronavirus (COVID-19) Cases Data. COVID-19 Pandemic. Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). United Nations Office for the Coordination of Humanitarian Affairs. 2020. Accessed October 2020. https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases
4. “Methodology for calculating indices” OXFORD COVID-19 Government Response Stringency index, COVID-19 Pandemic, Index methodology version 3.1. The Oxford COVID-19 Government Response Tracker. United Nations Office for the Coordination of Humanitarian Affairs. 25 May 2020. Accessed October 2020. https://github.com/OxCGRT/covid-policy-tracker/blob/master/documentation/index_methodology.md
5. “What’s Changed?” OXFORD COVID-19 Government Response Stringency index, COVID-19 Pandemic. The Oxford COVID-19 Government Response Tracker. United Nations Office for the Coordination of Humanitarian Affairs. 28 April 2020. Accessed October 2020. https://www.bsg.ox.ac.uk/sites/default/files/OxCGRT.%20What%27s%20changed%2024%20April%202020.pdf
6. “Income support during the COVID-19 pandemic” Coronavirus pandemic, Our World in Data, 2020. Accessed October 2020. https://ourworldindata.org/grapher/income-support-covid?time=2020-06-19
7. “Debt or contract relief during the COVID-19 pandemic” Coronavirus pandemic, Our World in Data, 2020. Accessed October 2020. https://ourworldindata.org/grapher/debt-relief-covid?time=2020-06-26
8. “International travel controls during the COVID-19 pandemic” Coronavirus pandemic, Our World in Data, 2020. Accessed October 2020. https://ourworldindata.org/grapher/international-travel-covid?time=2020-06-23
9. “How many Countries are there in the World?”, Worldometer, 2020. Accessed October 2020. https://www.worldometers.info/geography/how-many-countries-are-there-in-the-world/